Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Efficiency improvement of exp(::StridedMatrix) with UniformScaling and mul! #40668

Merged
merged 5 commits into from
May 6, 2021

Conversation

jarlebring
Copy link
Contributor

@jarlebring jarlebring commented Apr 30, 2021

Avoids the use of Inn=Matrix{T}(I,n,n). This reduces the number of
a) matrix-matrix products for nA<=2.1: commit d94b513
b) zero additions for nA>2.1: commit e6860a4

This PR also has some memory allocation improvements by using mul!.

Two CPU-time illustrations
a) nA<2.1

Random.seed!(0);
A=randn(100,100); A=1e-13*A/opnorm(A,1);

Original vs new:

  884.357 μs (31 allocations: 1.07 MiB)
  741.872 μs (21 allocations: 705.69 KiB)

b) nA>2.1

A=randn(50,50);
A=100*A/opnorm(A,1);

Original vs new:

  657.075 μs (43 allocations: 393.97 KiB)
  614.814 μs (45 allocations: 355.81 KiB)

The most important improvement is case a) where the number of matrix-matrix products is reduced.

Solves JuliaLang/LinearAlgebra.jl#840.

@jebej
Copy link
Contributor

jebej commented May 1, 2021

Could you try a larger matrix size for (b)? As the size you tested (n=50) it looks like there are no improvements.

@jarlebring
Copy link
Contributor Author

jarlebring commented May 3, 2021

For case b, in a relative sense the difference is becoming smaller and smaller the bigger the matrix because the computation time is drowning in the matrix-matrix CPU-time. You can see slightly more if you "eliminate" the matrix matrix products.

function exp_test_new(A,A2,A4,A6)
    T=eltype(A);
    CC = T[64764752532480000.,32382376266240000.,7771770303897600.,
           1187353796428800.,  129060195264000.,  10559470521600.,
           670442572800.,      33522128640.,      1323241920.,
           40840800.,           960960.,           16380.,
           182.,                1.]
    Ut = CC[4]*A2
    Ut[diagind(Ut)] .+=  CC[2]
    U  = ((CC[14].*A6 .+ CC[12].*A4 .+ CC[10].*A2) .+
              CC[8].*A6 .+ CC[6].*A4 .+ Ut)
    Vt = CC[3]*A2
    Vt[diagind(Vt)] .+=  CC[1]
    V  =  (CC[13].*A6 .+ CC[11].*A4 .+ CC[9].*A2) .+
        CC[7].*A6 .+ CC[5].*A4 .+ Vt
end


function exp_test_org(A,A2,A4,A6)
    T=eltype(A);
    CC = T[64764752532480000.,32382376266240000.,7771770303897600.,
           1187353796428800.,  129060195264000.,  10559470521600.,
           670442572800.,      33522128640.,      1323241920.,
           40840800.,           960960.,           16380.,
           182.,                1.]

    n=size(A,1);
    Inn=Matrix{eltype(A)}(I,n,n);
    U  = ((CC[14].*A6 .+ CC[12].*A4 .+ CC[10].*A2) .+
              CC[8].*A6 .+ CC[6].*A4 .+ CC[4].*A2 .+ CC[2].*Inn)
    V  = (CC[13].*A6 .+ CC[11].*A4 .+ CC[9].*A2) .+
        CC[7].*A6 .+ CC[5].*A4 .+ CC[3].*A2 .+ CC[1].*Inn
end

We can do timing like this:

julia> n=100;
julia> A=randn(n,n); A2=A*A; A4=A2*A2; A6=A2*A4;
julia> @btime exp_test_org($A,$A2,$A4,$A6);
  165.635 μs (7 allocations: 234.80 KiB)
julia> @btime exp_test_new($A,$A2,$A4,$A6);
  148.215 μs (15 allocations: 314.91 KiB)
julia> n=500;
julia> A=randn(n,n); A2=A*A; A4=A2*A2; A6=A2*A4;
julia> @btime exp_test_org($A,$A2,$A4,$A6);
  4.209 ms (7 allocations: 5.72 MiB)
julia> @btime exp_test_new($A,$A2,$A4,$A6);
  3.813 ms (15 allocations: 7.64 MiB)
julia> n=2000;
julia> A=randn(n,n); A2=A*A; A4=A2*A2; A6=A2*A4;
julia> @btime exp_test_org($A,$A2,$A4,$A6);
  69.824 ms (7 allocations: 91.55 MiB)
julia> @btime exp_test_new($A,$A2,$A4,$A6);
  65.964 ms (15 allocations: 122.10 MiB)

New code seems better for this domain of n. For larger n it seems inconclusive. Case a is the important modification in this PR.

Edit: After incorpating mul! usage. Both a and b are better.

@jarlebring
Copy link
Contributor Author

jarlebring commented May 3, 2021

I think one can save one additional allocation in case b by replacing Vt with an inplace scale of A2

    Vt = A2; # Recycle the A2 memory 
    rmul!(Vt,CC[3])
    Vt[diagind(Vt)] .+=  CC[1]

and then precompute where A2 is used in V. The code does get messy. I'm not convinced it is worthwhile.

@jarlebring jarlebring changed the title Efficiency improvement of exp(::StridedMatrix) with UniformScaling Efficiency improvement of exp(::StridedMatrix) with UniformScaling and mul! May 4, 2021
@dkarrasch
Copy link
Member

BTW, you might get rid of the massive CI failure by rebasing onto current master. There is some SuiteSparse checksums stuff going on. I was seeing all those failures and immediately started searching for typos. 😄

Copy link
Member

@dkarrasch dkarrasch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, a nice improvement. Shall we merge?

@jarlebring
Copy link
Contributor Author

Should I squash the commits? I would be happy to do it except my usual procedure with rebase -i HEAD~XX causes some trouble. I get strange git errors originating somehow from SuiteSparse.

@dkarrasch
Copy link
Member

I would have squashed it when merging, so there is no need to do that. As for the many other commits, I do the following procedure: first pull the current master into your local master, then checkout your branch, then

git pull --rebase origin master
git push --force

That will hopefully make all the intermediate commits go away.

@jarlebring
Copy link
Contributor Author

jarlebring commented May 6, 2021

Thanks. 👍 It seems the addition of the other commits spammed other PRs. Apologies. 🙈

@dkarrasch dkarrasch added merge me PR is reviewed. Merge when all tests are passing and removed merge me PR is reviewed. Merge when all tests are passing labels May 6, 2021
@dkarrasch dkarrasch merged commit 15b5143 into JuliaLang:master May 6, 2021
antoine-levitt pushed a commit to antoine-levitt/julia that referenced this pull request May 9, 2021
…d mul! (JuliaLang#40668)

Co-authored-by: Daniel Karrasch <daniel.karrasch@posteo.de>
dghosef pushed a commit to dghosef/julia that referenced this pull request May 11, 2021
…d mul! (JuliaLang#40668)

Co-authored-by: Daniel Karrasch <daniel.karrasch@posteo.de>
shirodkara pushed a commit to shirodkara/julia that referenced this pull request Jun 9, 2021
…d mul! (JuliaLang#40668)

Co-authored-by: Daniel Karrasch <daniel.karrasch@posteo.de>
johanmon pushed a commit to johanmon/julia that referenced this pull request Jul 5, 2021
…d mul! (JuliaLang#40668)

Co-authored-by: Daniel Karrasch <daniel.karrasch@posteo.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
linear algebra Linear algebra performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants